{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nOptimizing Operators with Auto-scheduling\n=========================================\n**Author**: `Lianmin Zheng <https://github.com/merrymercy>`_,             `Chengfan Jia <https://github.com/jcf94/>`_\n\nIn this tutorial, we will show how TVM's Auto Scheduling feature can find\noptimal schedules without the need for writing a custom template.\n\nDifferent from the template-based :doc:`AutoTVM <autotvm_matmul_x86>` which relies on\nmanual templates to define the search space, the auto-scheduler does not\nrequire any templates.  Users only need to write the computation declaration\nwithout any schedule commands or templates.  The auto-scheduler can\nautomatically generate a large search space and find a good schedule in the\nspace.\n\nWe use matrix multiplication as an example in this tutorial.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>Note that this tutorial will not run on Windows or recent versions of macOS. To\n  get it to run, you will need to wrap the body of this tutorial in a :code:`if\n  __name__ == \"__main__\":` block.</p></div>\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\n\nimport numpy as np\nimport tvm\nfrom tvm import te, auto_scheduler"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Defining the Matrix Multiplication\n----------------------------------\nTo start, we define a matrix multiplication with a bias addition.  Note that\nthis uses standard operations available in TVMs Tensor Expression language.\nThe major difference is the use of the `auto_sceduler` decorator at the top\nof the function definition.  The function should return a list of\ninput/output tensors.  From these tensors, the auto-scheduler can get the\nwhole computational graph.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "@auto_scheduler.register_workload  # Note the auto_scheduler decorator\ndef matmul_add(N, L, M, dtype):\n    A = te.placeholder((N, L), name=\"A\", dtype=dtype)\n    B = te.placeholder((L, M), name=\"B\", dtype=dtype)\n    C = te.placeholder((N, M), name=\"C\", dtype=dtype)\n\n    k = te.reduce_axis((0, L), name=\"k\")\n    matmul = te.compute(\n        (N, M),\n        lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),\n        name=\"matmul\",\n        attrs={\"layout_free_placeholders\": [B]},  # enable automatic layout transform for tensor B\n    )\n    out = te.compute((N, M), lambda i, j: matmul[i, j] + C[i, j], name=\"out\")\n\n    return [A, B, C, out]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create the search task\n----------------------\nWith the function defined, we can now create the task for the auto_scheduler\nto search against. We specify the particular parameters for this matrix\nmultiplication, in this case a multiplication of to square matricies of size\n1024x1024. We then create a search task with N=L=M=1024 and dtype=\"float32\"\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>Improve performance with custom targets\n  In order for TVM to take full advantage of specific hardware platforms,\n  you will want to manuall specify your CPU capabilities. For example:\n  - replace \"llvm\" below with \"llvm -mcpu=core-avx2\" to enable AVX2\n  - replace \"llvm\" below with \"llvm -mcpu=skylake-avx512\" to enable AVX-512</p></div>\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "target = tvm.target.Target(\"llvm\")\nN = L = M = 1024\ntask = tvm.auto_scheduler.SearchTask(func=matmul_add, args=(N, L, M, \"float32\"), target=target)\n\n# Inspect the computational graph\nprint(\"Computational DAG:\")\nprint(task.compute_dag)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Set Parameters for Auto-Scheduler\n---------------------------------\nNext, we set parameters for the auto-scheduler.\n\n* :code:`num_measure_trials` is the number of measurement trials we can use\n  during the search.  We only make 10 trials in this tutorial for a fast\n  demonstration. In practice, 1000 is a good value for the search to converge.\n  You can do more trials according to your time budget.\n* In addition, we use :code:`RecordToFile` to log measurement records into a\n  file `matmul.json`.  The measurement records can be used to query the history\n  best, resume the search, and do more analyses later.\n* see :any:`auto_scheduler.TuningOptions` for more parameters\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "log_file = \"matmul.json\"\ntune_option = auto_scheduler.TuningOptions(\n    num_measure_trials=10,\n    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],\n    verbose=2,\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run the search\n--------------\nNow we get all inputs ready. Pretty simple, isn't it?  We can kick off the\nsearch and let the auto-scheduler do its magic.  After some measurement\ntrials, we can load the best schedule from the log file and apply it.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Run auto-tuning (search)\ntask.tune(tune_option)\n# Apply the best schedule\nsch, args = task.apply_best(log_file)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Inspecting the Optimized Schedule\n---------------------------------\nWe can lower the schedule to see the IR after auto-scheduling.  The\nauto-scheduler correctly performs optimizations including multi-level tiling,\nlayout transformation, parallelization, vectorization, unrolling, and\noperator fusion.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"Lowered TIR:\")\nprint(tvm.lower(sch, args, simple_mode=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Check correctness and evaluate performance\n------------------------------------------\nWe build the binary and check its correctness and performance.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "func = tvm.build(sch, args, target)\na_np = np.random.uniform(size=(N, L)).astype(np.float32)\nb_np = np.random.uniform(size=(L, M)).astype(np.float32)\nc_np = np.random.uniform(size=(N, M)).astype(np.float32)\nout_np = a_np.dot(b_np) + c_np\n\ndev = tvm.cpu()\na_tvm = tvm.nd.array(a_np, device=dev)\nb_tvm = tvm.nd.array(b_np, device=dev)\nc_tvm = tvm.nd.array(c_np, device=dev)\nout_tvm = tvm.nd.empty(out_np.shape, device=dev)\nfunc(a_tvm, b_tvm, c_tvm, out_tvm)\n\n# Check results\nnp.testing.assert_allclose(out_np, out_tvm.numpy(), rtol=1e-3)\n\n# Evaluate execution time.\nevaluator = func.time_evaluator(func.entry_name, dev, min_repeat_ms=500)\nprint(\n    \"Execution time of this operator: %.3f ms\"\n    % (np.median(evaluator(a_tvm, b_tvm, c_tvm, out_tvm).results) * 1000)\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Using the record file\n---------------------\nDuring the search, all measurement records are logged into the record file\n\"matmul.json\". The measurement records can be used to re-apply search\nresults, resume the search, and perform other analyses.\n\nHere is an example where we load the best schedule from a file, and print the\nequivalent python schedule API. This can be used for debugging and learning\nthe behavior of the auto-scheduler.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"Equivalent python schedule:\")\nprint(task.print_best(log_file))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A more complicated example is to resume the search.  In this case, we need to\ncreate the search policy and cost model by ourselves and resume the status of\nsearch policy and cost model with the log file.  In the example below we\nresume the status and do more 5 trials.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def resume_search(task, log_file):\n    print(\"Resume search:\")\n    cost_model = auto_scheduler.XGBModel()\n    cost_model.update_from_file(log_file)\n    search_policy = auto_scheduler.SketchPolicy(\n        task, cost_model, init_search_callbacks=[auto_scheduler.PreloadMeasuredStates(log_file)]\n    )\n    tune_option = auto_scheduler.TuningOptions(\n        num_measure_trials=5, measure_callbacks=[auto_scheduler.RecordToFile(log_file)]\n    )\n    task.tune(tune_option, search_policy=search_policy)\n\n\nresume_search(task, log_file)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Final Notes and Summary\n-----------------------\nIn this tutorial, we have shown how to use the TVM Auto-Scheduler to\nautomatically optimize a matrix multiplication, without the need to specify a\nsearch template.  It ends a series of examples that starts from the Tensor\nExpression (TE) language that demonstrates how TVM can optimize computational\noperations.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}